Bayesian Q-Learning
نویسندگان
چکیده
A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information—the expected improvement in future decision quality that might arise from the information acquired by exploration. Estimating this quantity requires an assessment of the agent’s uncertainty about its current value estimates for states. In this paper, we adopt a Bayesian approach to maintaining this uncertain information. We extend Watkins’ Q-learning by maintaining and propagating probability distributions over the Q-values. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. We establish the convergence properties of our algorithm and show experimentally that it can exhibit substantial improvements over other well-known model-free exploration strategies.
منابع مشابه
Bayesian Hypernetworks
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network which learns to transform a simple noise distribution, p( ) = N (0, I), to a distribution q(θ) . = q(h( )) over the parameters θ of another neural network (the "primary network"). We train q with variational inference, using an invertible h to ena...
متن کاملBayesian Deep Q-Learning via Continuous-Time Flows
Efficient exploration in reinforcement learning (RL) can be achieved by incorporating uncertainty into model predictions. Bayesian deep Q-learning provides a principle way for this by modeling Q-values as probability distributions. We propose an efficient algorithm for Bayesian deep Q-learning by posterior sampling actions in the Q-function via continuous-time flows (CTFs), achieving efficient ...
متن کاملBayesian Ying - Yang system , best harmony learning , and five action circling
Firstly proposed in 1995 and systematically developed in the past decade, Bayesian YingYang learning is a statistical approach for a two pathway featured intelligent system via two complementary Bayesian representations of a joint distribution on the external observation X and its inner representation R, which can be understood from a perspective of the ancient Ying-Yang philosophy. We have q(X...
متن کاملEfficient Exploration through Bayesian Deep Q-Networks
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) mode...
متن کاملBayesian Q-learning with Assumed Density Filtering
While off-policy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings non-linearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesia...
متن کاملUncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats
The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects the action choice and learning rate. We designed a choice task in which rats selected either the left-poking or right-poking hole and received a reward of a food pellet stochastically. The reward probabilities o...
متن کامل